

Search for: All records

Creators/Authors contains: "Zhang, Jun"



  1. Abstract

    The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers fail to generate high-quality phased assemblies from long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygous alleles while correcting sequencing errors. We combine a corrected-read SNP caller and a raw-read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using only Nanopore R9, PacBio CLR or Nanopore R10 reads. PECAT generates more contiguous haplotype-specific contigs than other assemblers. In particular, PECAT achieves a nearly haplotype-resolved assembly of B. taurus (Bison × Simmental) using Nanopore R9 reads and a phase block NG50 of 59.4/58.0 Mb for HG002 using Nanopore R10 reads.

     
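    The abstract above describes PECAT's read-grouping step, in which long reads are assigned to haplotype groups according to the heterozygous SNP alleles they carry. The sketch below illustrates that general idea only; it is not PECAT's implementation, and the function name, data structures, and tie-handling rule are hypothetical.

```python
# Hypothetical sketch: assign reads to haplotype groups by the heterozygous
# SNP alleles they carry. Illustrates the general idea described in the
# abstract, not PECAT's actual algorithm or data structures.

def group_reads_by_haplotype(reads, hap_a, hap_b):
    """reads: dict read_id -> {snp_position: allele}
    hap_a, hap_b: dict snp_position -> allele for each haplotype.
    Returns dict read_id -> 'A', 'B', or 'unassigned'."""
    groups = {}
    for read_id, alleles in reads.items():
        score_a = sum(1 for pos, a in alleles.items() if hap_a.get(pos) == a)
        score_b = sum(1 for pos, a in alleles.items() if hap_b.get(pos) == a)
        if score_a > score_b:
            groups[read_id] = "A"
        elif score_b > score_a:
            groups[read_id] = "B"
        else:
            groups[read_id] = "unassigned"  # tie or no informative SNPs
    return groups

# Example: read r1 matches haplotype A at both SNP sites, r2 matches B.
reads = {"r1": {100: "A", 250: "T"}, "r2": {100: "G", 250: "C"}}
hap_a = {100: "A", 250: "T"}
hap_b = {100: "G", 250: "C"}
print(group_reads_by_haplotype(reads, hap_a, hap_b))  # {'r1': 'A', 'r2': 'B'}
```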
  2. Free, publicly-accessible full text available July 1, 2024
  3. Abstract

    Integrated hydrological modeling is an effective method for understanding interactions between parts of the hydrologic cycle, quantifying water resources, and furthering knowledge of hydrologic processes. However, these models are dependent on robust and accurate datasets that physically represent spatial characteristics as model inputs. This study evaluates multiple data-driven approaches for estimating hydraulic conductivity and subsurface properties at the continental scale, constructed from existing subsurface dataset components. Each subsurface configuration represents upper (unconfined) hydrogeology, lower (confined) hydrogeology, and the presence of a vertical flow barrier. Configurations are tested in two large-scale U.S. watersheds using an integrated model. Model results are compared to observed streamflow and steady-state water table depth (WTD). We provide model results for a range of configurations and show that both WTD and surface water partitioning are important indicators of performance. We also show that geology data source, total subsurface depth, anisotropy, and inclusion of a vertical flow barrier are the most important considerations for subsurface configurations. While a range of configurations proved viable, we provide a recommended Selected National Configuration 1 km resolution subsurface dataset for use in distributed large- and continental-scale hydrologic modeling.

     
    Free, publicly-accessible full text available October 18, 2024
  4. Abstract

    This study synthesizes two different methods for estimating hydraulic conductivity (K) at large scales. We derive analytical approaches that estimate K and apply them to the contiguous United States. We then compare these analytical approaches to three-dimensional, national gridded K data products and three transmissivity (T) data products developed from publicly available sources. We evaluate these data products using multiple approaches: comparing their statistics qualitatively and quantitatively and with hydrologic model simulations. Some of these datasets were used as inputs for an integrated hydrologic model of the Upper Colorado River Basin, and the comparison of the results with observations was used to further evaluate the K data products. Simulated average daily streamflow was compared to daily flow data from 10 USGS stream gages in the domain, and annually averaged simulated groundwater depths are compared to observations from nearly 2000 monitoring wells. We find streamflow predictions from analytically informed simulations to be similar in relative bias and Spearman's rho to the geologically informed simulations. R-squared values for groundwater depth predictions are close between the best-performing analytically and geologically informed simulations, at 0.68 and 0.70 respectively, with RMSE values under 10 m. We also show that the analytical approach derived by this study produces estimates of K that are similar in spatial distribution, standard deviation, mean value, and modeling performance to geologically informed estimates. The results of this work are used to inform a follow-on study that tests additional data-driven approaches in multiple basins within the contiguous United States.

     
    Free, publicly-accessible full text available September 29, 2024
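    The abstract above reports model skill as relative bias, Spearman's rho, R-squared, and RMSE between simulated and observed streamflow and groundwater depth. For reference, the sketch below shows one common way to compute these metrics for paired simulated/observed series; it is illustrative, not the authors' evaluation code, and metric definitions (particularly R-squared) can vary by study.

```python
# Illustrative computation of the evaluation metrics named in the abstract
# (relative bias, Spearman's rho, R-squared, RMSE) for paired simulated and
# observed series. Not the authors' code; definitions can vary by study.
import numpy as np
from scipy.stats import spearmanr

def evaluate(sim, obs):
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    rel_bias = (sim.mean() - obs.mean()) / obs.mean()   # relative bias
    rho, _ = spearmanr(sim, obs)                        # rank correlation
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot                   # coefficient of determination
    rmse = np.sqrt(np.mean((sim - obs) ** 2))           # root-mean-square error
    return {"relative_bias": rel_bias, "spearman_rho": rho,
            "r_squared": r_squared, "rmse": rmse}

# Example with synthetic daily streamflow values (m^3/s)
print(evaluate(sim=[10.2, 12.1, 9.8, 15.0], obs=[11.0, 12.5, 9.0, 14.2]))
```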
  5. Key Points
     - Lateral entrainment of air from the moat region into the eyewall and rainbands of a tropical cyclone (TC) satisfies the instability criterion.
     - Positive buoyancy flux induced by the entrainment is an important source of turbulent kinetic energy for the eyewall and rainband clouds.
     - Lateral entrainment instability should be included in turbulent mixing parameterizations in TC forecast models.
    Free, publicly-accessible full text available April 28, 2024
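    The second key point above refers to buoyancy flux as a source of turbulent kinetic energy (TKE). For context, the buoyancy-production term of the standard boundary-layer TKE budget is commonly written as below; this is a textbook form, not a result or notation specific to this record.

```latex
% Buoyancy production of turbulent kinetic energy (standard boundary-layer form).
% TKE is generated when the vertical flux of virtual potential temperature,
% \overline{w'\theta_v'}, is positive (directed upward).
B \;=\; \frac{g}{\overline{\theta_v}}\,\overline{w'\theta_v'}
```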
  6. Abstract

    The momentum roughness length (z₀) significantly impacts wind predictions in weather and climate models. Nevertheless, the impacts of z₀ parameterizations in different wind regimes and various model configurations on hurricane size, intensity, and track simulations have not been thoroughly established. To bridge this knowledge gap, a comprehensive analysis of 310 simulations of 10 real hurricanes using the Weather Research and Forecasting (WRF) Model is conducted in comparison with observations. Our results show that the default z₀ parameterizations in WRF perform well for weak (category 1–2) hurricanes; however, they underestimate the intensities of strong (category 3–5) hurricanes. This finding is independent of model resolution or boundary layer scheme. The default values of z₀ in WRF agree with observational estimates from dropsonde data in weak hurricanes, while they are much larger than observations in the strong-hurricane regime. Decreasing z₀ toward the values from observational estimates and theoretical hurricane intensity models in high-wind regimes (≳45 m s⁻¹) leads to significant improvements in the intensity forecasts of strong hurricanes. A momentum budget analysis dynamically explains why the reduction of z₀ (decreased surface turbulent stresses) leads to stronger simulated storms.
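    The abstract above concerns how z₀ is parameterized as a function of wind speed and how reducing it in high-wind regimes affects simulated intensity. The sketch below shows one common shape such a parameterization can take: a Charnock-type relation for z₀ from friction velocity with a cap applied above a wind-speed threshold. The constants, threshold, and cap are illustrative assumptions, not the values used in the study or in WRF.

```python
# Illustrative (not WRF's) roughness-length parameterization: Charnock-type
# z0 from friction velocity, with a hypothetical cap at high wind speeds as
# discussed qualitatively in the abstract. All constants are assumptions.
G = 9.81                 # gravitational acceleration (m s^-2)
ALPHA_CHARNOCK = 0.011   # Charnock coefficient (typical open-ocean value)
Z0_CAP = 2.0e-3          # hypothetical upper bound on z0 (m) in strong winds
HIGH_WIND = 45.0         # wind-speed threshold (m s^-1) cited in the abstract

def roughness_length(u_star, wind_speed_10m):
    """Return momentum roughness length z0 (m) from friction velocity u_star."""
    z0 = ALPHA_CHARNOCK * u_star**2 / G      # Charnock relation
    if wind_speed_10m >= HIGH_WIND:
        z0 = min(z0, Z0_CAP)                 # cap z0 in the high-wind regime
    return z0

# Example: weak-wind vs strong-wind cases with assumed friction velocities
print(roughness_length(u_star=0.5, wind_speed_10m=15.0))  # ~2.8e-4 m, uncapped
print(roughness_length(u_star=2.0, wind_speed_10m=60.0))  # capped at 2.0e-3 m
```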
  7. Free, publicly-accessible full text available May 1, 2024
  8. Abstract

    Airborne Doppler radar observations of the wind field in the tropical cyclone boundary layer (TCBL) during the landfall of Hurricane Ida (2021) are examined here. Asymmetries in tangential and radial flow are governed by tropical cyclone (TC) motion and vertical wind shear prior to landfall, while frictional effects dominate the asymmetry location during landfall. Strong TCBL inflow on the offshore‐flow side of Ida occurs during landfall, while the location of the peak tangential wind at the top of the TCBL during this period is located on the onshore‐flow side. A comparison of these observations with a numerical simulation of TC landfall shows many consistencies with the modeling study, though there are some notable differences that may be related to differences in the characteristics of the land surface between the simulation and the observations here.

     
  9. Abstract

    Metagenomics is the study of all genomic content contained in a given microbial community. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.

     
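    The abstract above quantifies gene-calling performance as recall and precision against reference genes. For reference, the minimal sketch below computes those two metrics from sets of predicted and reference gene identifiers; it only illustrates the metrics themselves and is not iMPP's evaluation procedure, in which predictions are typically matched to references by alignment rather than by exact identifiers.

```python
# Illustration of the recall/precision metrics cited in the abstract, computed
# here over sets of gene identifiers. This is not iMPP's evaluation code.
def recall_precision(predicted, reference):
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    recall = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision

# Example: 3 of 4 reference genes recovered, 1 false positive among 4 calls
r, p = recall_precision(predicted={"g1", "g2", "g3", "gX"},
                        reference={"g1", "g2", "g3", "g4"})
print(f"recall={r:.2f}, precision={p:.2f}")  # recall=0.75, precision=0.75
```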